Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 1000 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 62.7 KiB |
| Average record size in memory | 64.3 B |
Variable types
| Categorical | 9 |
|---|---|
| Numeric | 6 |
Alerts
| feat.e is highly correlated with feat.i | High correlation |
|---|---|
| feat.f is highly correlated with response | High correlation |
| feat.i is highly correlated with feat.e | High correlation |
| response is highly correlated with feat.f | High correlation |
| feat.g_x is highly correlated with feat.g_y and 1 other fields | High correlation |
| feat.g_y is highly correlated with feat.g_x and 1 other fields | High correlation |
| feat.g_z is highly correlated with feat.g_x and 1 other fields | High correlation |
| feat.c_a is highly correlated with feat.c_b | High correlation |
| feat.c_b is highly correlated with feat.c_a and 1 other fields | High correlation |
| feat.c_d is highly correlated with feat.c_b | High correlation |
| feat.a has unique values | Unique |
| feat.e has unique values | Unique |
| feat.f has unique values | Unique |
| feat.h has unique values | Unique |
| feat.i has unique values | Unique |
Reproduction
| Analysis started | 2022-11-22 19:51:19.627091 |
|---|---|
| Analysis finished | 2022-11-22 19:51:24.344589 |
| Duration | 4.72 seconds |
| Software version | pandas-profiling v3.4.0 |
| Download configuration | config.json |
response
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.2 KiB |
| 1 | |
|---|---|
| 0 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 553 | 55.3% |
| 0 | 447 | 44.7% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 1 | 553 | 55.3% |
| 0 | 447 | 44.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 553 | 55.3% |
| 0 | 447 | 44.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1000 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 553 | 55.3% |
| 0 | 447 | 44.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 553 | 55.3% |
| 0 | 447 | 44.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 553 | 55.3% |
| 0 | 447 | 44.7% |
feat.a
Real number (ℝ)
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.048383598 |
| Minimum | -7.429324037 |
|---|---|
| Maximum | 10.7231198 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 353 |
| Negative (%) | 35.3% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -7.429324037 |
|---|---|
| 5-th percentile | -3.867752929 |
| Q1 | -0.8849727281 |
| median | 1.027628916 |
| Q3 | 2.9938056 |
| 95-th percentile | 6.028401614 |
| Maximum | 10.7231198 |
| Range | 18.15244384 |
| Interquartile range (IQR) | 3.878778328 |
Descriptive statistics
| Standard deviation | 2.975084929 |
|---|---|
| Coefficient of variation (CV) | 2.837782788 |
| Kurtosis | -0.0686019667 |
| Mean | 1.048383598 |
| Median Absolute Deviation (MAD) | 1.95091297 |
| Skewness | 0.06539204332 |
| Sum | 1048.383598 |
| Variance | 8.851130336 |
| Monotonicity | Not monotonic |
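Several entries in the quantile and descriptive statistics tables above are linked by simple identities. The following sketch recomputes them from the printed values. It assumes that the coefficient of variation is the standard deviation divided by the mean and that the variance is the square of the reported standard deviation; the variable names are illustrative only.

```python
# Values copied from the quantile and descriptive statistics tables above.
minimum, maximum = -7.429324037, 10.7231198
q1, q3 = -0.8849727281, 2.9938056
mean, std = 1.048383598, 2.975084929
n_observations = 1000

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std / mean                   # Coefficient of variation (assumed std / mean)
variance = std ** 2               # Variance (assumed square of the reported std)
total = mean * n_observations     # Sum = Mean * number of observations

for name, value in [
    ("Range", value_range),
    ("IQR", iqr),
    ("CV", cv),
    ("Variance", variance),
    ("Sum", total),
]:
    print(f"{name}: {value:.6f}")
```

Because the coefficient of variation divides by the mean, it takes the sign of the mean and becomes large in magnitude when the mean is close to zero.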
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -0.6814269397 | 1 | 0.1% |
| 0.1177140185 | 1 | 0.1% |
| 3.307156885 | 1 | 0.1% |
| 1.362157988 | 1 | 0.1% |
| 3.590945302 | 1 | 0.1% |
| 5.141543585 | 1 | 0.1% |
| 6.898744046 | 1 | 0.1% |
| 0.9148148358 | 1 | 0.1% |
| -5.747153271 | 1 | 0.1% |
| 1.094578014 | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
| Value | Count | Frequency (%) |
| -7.429324037 | 1 | 0.1% |
| -6.982768395 | 1 | 0.1% |
| -6.929446856 | 1 | 0.1% |
| -6.805099011 | 1 | 0.1% |
| -6.523753407 | 1 | 0.1% |
| -6.397694581 | 1 | 0.1% |
| -5.927506627 | 1 | 0.1% |
| -5.747153271 | 1 | 0.1% |
| -5.674963089 | 1 | 0.1% |
| -5.631899332 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 10.7231198 | 1 | 0.1% |
| 9.07514201 | 1 | 0.1% |
| 9.054576998 | 1 | 0.1% |
| 8.726349291 | 1 | 0.1% |
| 8.714374438 | 1 | 0.1% |
| 8.65907834 | 1 | 0.1% |
| 8.463993632 | 1 | 0.1% |
| 8.374181476 | 1 | 0.1% |
| 8.290679957 | 1 | 0.1% |
| 8.250320061 | 1 | 0.1% |
feat.b
Real number (ℝ)
| Distinct | 993 |
|---|---|
| Distinct (%) | 99.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -3.941324755 |
| Minimum | -8.571791335 |
|---|---|
| Maximum | 1.085556232 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 995 |
| Negative (%) | 99.5% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -8.571791335 |
|---|---|
| 5-th percentile | -6.482529863 |
| Q1 | -4.978745311 |
| median | -3.917721434 |
| Q3 | -2.893249761 |
| 95-th percentile | -1.60183933 |
| Maximum | 1.085556232 |
| Range | 9.657347567 |
| Interquartile range (IQR) | 2.08549555 |
Descriptive statistics
| Standard deviation | 1.506601557 |
|---|---|
| Coefficient of variation (CV) | -0.3822576547 |
| Kurtosis | -0.06023898819 |
| Mean | -3.941324755 |
| Median Absolute Deviation (MAD) | 1.048129977 |
| Skewness | -0.0231639689 |
| Sum | -3941.324755 |
| Variance | 2.269848252 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -3.917721434 | 8 | 0.8% |
| -5.493698087 | 1 | 0.1% |
| -2.481194209 | 1 | 0.1% |
| -2.667889449 | 1 | 0.1% |
| -6.909910232 | 1 | 0.1% |
| -2.465197367 | 1 | 0.1% |
| -3.99181341 | 1 | 0.1% |
| -3.145331546 | 1 | 0.1% |
| -6.479883345 | 1 | 0.1% |
| -4.99998157 | 1 | 0.1% |
| Other values (983) | 983 | 98.3% |
| Value | Count | Frequency (%) |
| -8.571791335 | 1 | 0.1% |
| -8.042994054 | 1 | 0.1% |
| -7.943988162 | 1 | 0.1% |
| -7.906057256 | 1 | 0.1% |
| -7.824014162 | 1 | 0.1% |
| -7.693862525 | 1 | 0.1% |
| -7.566110388 | 1 | 0.1% |
| -7.503920998 | 1 | 0.1% |
| -7.470603661 | 1 | 0.1% |
| -7.376567763 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 1.085556232 | 1 | 0.1% |
| 0.935776165 | 1 | 0.1% |
| 0.7760667111 | 1 | 0.1% |
| 0.224126414 | 1 | 0.1% |
| 0.1960867204 | 1 | 0.1% |
| -0.1340978557 | 1 | 0.1% |
| -0.281881298 | 1 | 0.1% |
| -0.3280029895 | 1 | 0.1% |
| -0.3632662883 | 1 | 0.1% |
| -0.3756889403 | 1 | 0.1% |
feat.d
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
| 1.0 | |
|---|---|
| 0.0 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 3000 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
| 1.0 | 520 | 52.0% |
| 0.0 | 480 | 48.0% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 1.0 | 520 | 52.0% |
| 0.0 | 480 | 48.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1480 | 49.3% |
| . | 1000 | 33.3% |
| 1 | 520 | 17.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2000 | 66.7% |
| Other Punctuation | 1000 | 33.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1480 | 74.0% |
| 1 | 520 | 26.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 1000 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 3000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 1480 | 49.3% |
| . | 1000 | 33.3% |
| 1 | 520 | 17.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 1480 | 49.3% |
| . | 1000 | 33.3% |
| 1 | 520 | 17.3% |
feat.e
Real number (ℝ)
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -0.5183207529 |
| Minimum | -6.758176383 |
|---|---|
| Maximum | 5.289708782 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 596 |
| Negative (%) | 59.6% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -6.758176383 |
|---|---|
| 5-th percentile | -3.840913833 |
| Q1 | -1.779296264 |
| median | -0.5163815105 |
| Q3 | 0.8010775699 |
| 95-th percentile | 2.774485945 |
| Maximum | 5.289708782 |
| Range | 12.04788516 |
| Interquartile range (IQR) | 2.580373833 |
Descriptive statistics
| Standard deviation | 1.984703368 |
|---|---|
| Coefficient of variation (CV) | -3.829102649 |
| Kurtosis | -0.03436963598 |
| Mean | -0.5183207529 |
| Median Absolute Deviation (MAD) | 1.285863659 |
| Skewness | -0.07150994541 |
| Sum | -518.3207529 |
| Variance | 3.939047458 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -0.8006149557 | 1 | 0.1% |
| -0.4105429274 | 1 | 0.1% |
| -2.149190977 | 1 | 0.1% |
| 0.6691611674 | 1 | 0.1% |
| -2.496597332 | 1 | 0.1% |
| -3.468563012 | 1 | 0.1% |
| 0.01555496617 | 1 | 0.1% |
| 0.3305800081 | 1 | 0.1% |
| 1.550839146 | 1 | 0.1% |
| 0.9521521358 | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
| Value | Count | Frequency (%) |
| -6.758176383 | 1 | 0.1% |
| -6.399743569 | 1 | 0.1% |
| -6.335952428 | 1 | 0.1% |
| -6.178752334 | 1 | 0.1% |
| -5.856328822 | 1 | 0.1% |
| -5.819338759 | 1 | 0.1% |
| -5.719050018 | 1 | 0.1% |
| -5.473047809 | 1 | 0.1% |
| -5.271585502 | 1 | 0.1% |
| -5.166574755 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 5.289708782 | 1 | 0.1% |
| 4.807481457 | 1 | 0.1% |
| 4.698983411 | 1 | 0.1% |
| 4.696980464 | 1 | 0.1% |
| 4.628818601 | 1 | 0.1% |
| 4.580737248 | 1 | 0.1% |
| 4.466210225 | 1 | 0.1% |
| 4.245676957 | 1 | 0.1% |
| 3.971205537 | 1 | 0.1% |
| 3.96399422 | 1 | 0.1% |
feat.f
Real number (ℝ)
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -6.257345469 |
| Minimum | -31.09907622 |
|---|---|
| Maximum | 21.56793583 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 789 |
| Negative (%) | 78.9% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -31.09907622 |
|---|---|
| 5-th percentile | -19.48796709 |
| Q1 | -11.65484815 |
| median | -6.262428864 |
| Q3 | -0.912532982 |
| 95-th percentile | 6.307771472 |
| Maximum | 21.56793583 |
| Range | 52.66701205 |
| Interquartile range (IQR) | 10.74231517 |
Descriptive statistics
| Standard deviation | 8.005529545 |
|---|---|
| Coefficient of variation (CV) | -1.279381103 |
| Kurtosis | 0.1736241289 |
| Mean | -6.257345469 |
| Median Absolute Deviation (MAD) | 5.371094591 |
| Skewness | 0.02786552246 |
| Sum | -6257.345469 |
| Variance | 64.0885033 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -4.427601788 | 1 | 0.1% |
| 5.733868313 | 1 | 0.1% |
| -16.23534137 | 1 | 0.1% |
| -4.527911115 | 1 | 0.1% |
| -11.99220613 | 1 | 0.1% |
| -10.86518825 | 1 | 0.1% |
| -2.641091002 | 1 | 0.1% |
| 0.7347983651 | 1 | 0.1% |
| -2.958744506 | 1 | 0.1% |
| -10.27875464 | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
| Value | Count | Frequency (%) |
| -31.09907622 | 1 | 0.1% |
| -30.34507874 | 1 | 0.1% |
| -29.10103863 | 1 | 0.1% |
| -28.74414316 | 1 | 0.1% |
| -28.02887003 | 1 | 0.1% |
| -25.93349585 | 1 | 0.1% |
| -25.90821945 | 1 | 0.1% |
| -25.61192808 | 1 | 0.1% |
| -25.58897083 | 1 | 0.1% |
| -25.03581109 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 21.56793583 | 1 | 0.1% |
| 20.17426201 | 1 | 0.1% |
| 19.88434253 | 1 | 0.1% |
| 17.93220262 | 1 | 0.1% |
| 17.77268027 | 1 | 0.1% |
| 17.3905916 | 1 | 0.1% |
| 14.17918456 | 1 | 0.1% |
| 14.01412077 | 1 | 0.1% |
| 13.32045114 | 1 | 0.1% |
| 13.04330909 | 1 | 0.1% |
feat.h
Real number (ℝ)
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.03083308 |
| Minimum | 3.421248303 |
|---|---|
| Maximum | 17.43144145 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 3.421248303 |
|---|---|
| 5-th percentile | 6.692429352 |
| Q1 | 8.700144228 |
| median | 10.02830899 |
| Q3 | 11.52875894 |
| 95-th percentile | 13.25505024 |
| Maximum | 17.43144145 |
| Range | 14.01019315 |
| Interquartile range (IQR) | 2.828614715 |
Descriptive statistics
| Standard deviation | 2.022200156 |
|---|---|
| Coefficient of variation (CV) | 0.2015984257 |
| Kurtosis | 0.03984029824 |
| Mean | 10.03083308 |
| Median Absolute Deviation (MAD) | 1.421113657 |
| Skewness | -0.1302689443 |
| Sum | 10030.83308 |
| Variance | 4.089293472 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 10.25419887 | 1 | 0.1% |
| 6.754303427 | 1 | 0.1% |
| 11.24278619 | 1 | 0.1% |
| 12.68298848 | 1 | 0.1% |
| 9.352376556 | 1 | 0.1% |
| 10.28876694 | 1 | 0.1% |
| 7.836294432 | 1 | 0.1% |
| 9.988045316 | 1 | 0.1% |
| 9.277988632 | 1 | 0.1% |
| 8.493310924 | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
| Value | Count | Frequency (%) |
| 3.421248303 | 1 | 0.1% |
| 3.475701331 | 1 | 0.1% |
| 3.945908562 | 1 | 0.1% |
| 4.151417736 | 1 | 0.1% |
| 4.22770175 | 1 | 0.1% |
| 4.51677224 | 1 | 0.1% |
| 4.614397502 | 1 | 0.1% |
| 4.86227061 | 1 | 0.1% |
| 5.022192772 | 1 | 0.1% |
| 5.217552022 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 17.43144145 | 1 | 0.1% |
| 16.16747908 | 1 | 0.1% |
| 15.85780148 | 1 | 0.1% |
| 15.69748807 | 1 | 0.1% |
| 14.83414122 | 1 | 0.1% |
| 14.79040265 | 1 | 0.1% |
| 14.62396292 | 1 | 0.1% |
| 14.62363889 | 1 | 0.1% |
| 14.48832726 | 1 | 0.1% |
| 14.47632645 | 1 | 0.1% |
feat.i
Real number (ℝ)
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -0.5186066973 |
| Minimum | -6.763426764 |
|---|---|
| Maximum | 5.315728559 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 598 |
| Negative (%) | 59.8% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -6.763426764 |
|---|---|
| 5-th percentile | -3.886815871 |
| Q1 | -1.773089044 |
| median | -0.5060849539 |
| Q3 | 0.8030406826 |
| 95-th percentile | 2.78705719 |
| Maximum | 5.315728559 |
| Range | 12.07915532 |
| Interquartile range (IQR) | 2.576129727 |
Descriptive statistics
| Standard deviation | 1.984378137 |
|---|---|
| Coefficient of variation (CV) | -3.826364271 |
| Kurtosis | -0.02932702066 |
| Mean | -0.5186066973 |
| Median Absolute Deviation (MAD) | 1.292620753 |
| Skewness | -0.07130431438 |
| Sum | -518.6066973 |
| Variance | 3.937756592 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -0.8280728697 | 1 | 0.1% |
| -0.3358788662 | 1 | 0.1% |
| -2.138754969 | 1 | 0.1% |
| 0.7567881697 | 1 | 0.1% |
| -2.505539361 | 1 | 0.1% |
| -3.490892139 | 1 | 0.1% |
| 0.07054932741 | 1 | 0.1% |
| 0.3639592021 | 1 | 0.1% |
| 1.607569457 | 1 | 0.1% |
| 0.9758482567 | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
| Value | Count | Frequency (%) |
| -6.763426764 | 1 | 0.1% |
| -6.398746196 | 1 | 0.1% |
| -6.376277409 | 1 | 0.1% |
| -6.192309923 | 1 | 0.1% |
| -5.845633414 | 1 | 0.1% |
| -5.82995219 | 1 | 0.1% |
| -5.756198292 | 1 | 0.1% |
| -5.522454828 | 1 | 0.1% |
| -5.215154372 | 1 | 0.1% |
| -5.174046974 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 5.315728559 | 1 | 0.1% |
| 4.842965736 | 1 | 0.1% |
| 4.715903484 | 1 | 0.1% |
| 4.646257329 | 1 | 0.1% |
| 4.588374531 | 1 | 0.1% |
| 4.550044682 | 1 | 0.1% |
| 4.446508874 | 1 | 0.1% |
| 4.248004011 | 1 | 0.1% |
| 3.966186668 | 1 | 0.1% |
| 3.947004253 | 1 | 0.1% |
feat.c_a
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 760 | 76.0% |
| 1 | 240 | 24.0% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 760 | 76.0% |
| 1 | 240 | 24.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 760 | 76.0% |
| 1 | 240 | 24.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1000 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 760 | 76.0% |
| 1 | 240 | 24.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 760 | 76.0% |
| 1 | 240 | 24.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 760 | 76.0% |
| 1 | 240 | 24.0% |
feat.c_b
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 722 | 72.2% |
| 1 | 278 | 27.8% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 722 | 72.2% |
| 1 | 278 | 27.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 722 | 72.2% |
| 1 | 278 | 27.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1000 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 722 | 72.2% |
| 1 | 278 | 27.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 722 | 72.2% |
| 1 | 278 | 27.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 722 | 72.2% |
| 1 | 278 | 27.8% |
feat.c_c
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 773 | 77.3% |
| 1 | 227 | 22.7% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 773 | 77.3% |
| 1 | 227 | 22.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 773 | 77.3% |
| 1 | 227 | 22.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1000 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 773 | 77.3% |
| 1 | 227 | 22.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 773 | 77.3% |
| 1 | 227 | 22.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 773 | 77.3% |
| 1 | 227 | 22.7% |
feat.c_d
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 745 | 74.5% |
| 1 | 255 | 25.5% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 745 | 74.5% |
| 1 | 255 | 25.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 745 | 74.5% |
| 1 | 255 | 25.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1000 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 745 | 74.5% |
| 1 | 255 | 25.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 745 | 74.5% |
| 1 | 255 | 25.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 745 | 74.5% |
| 1 | 255 | 25.5% |
feat.g_x
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 670 | 67.0% |
| 1 | 330 | 33.0% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 670 | 67.0% |
| 1 | 330 | 33.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 670 | 67.0% |
| 1 | 330 | 33.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1000 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 670 | 67.0% |
| 1 | 330 | 33.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 670 | 67.0% |
| 1 | 330 | 33.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 670 | 67.0% |
| 1 | 330 | 33.0% |
feat.g_y
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 671 | 67.1% |
| 1 | 329 | 32.9% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 671 | 67.1% |
| 1 | 329 | 32.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 671 | 67.1% |
| 1 | 329 | 32.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1000 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 671 | 67.1% |
| 1 | 329 | 32.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 671 | 67.1% |
| 1 | 329 | 32.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 671 | 67.1% |
| 1 | 329 | 32.9% |
feat.g_z
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 659 | 65.9% |
| 1 | 341 | 34.1% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 659 | 65.9% |
| 1 | 341 | 34.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 659 | 65.9% |
| 1 | 341 | 34.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1000 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 659 | 65.9% |
| 1 | 341 | 34.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 659 | 65.9% |
| 1 | 341 | 34.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 659 | 65.9% |
| 1 | 341 | 34.1% |
Auto
The auto setting is an easily interpretable pairwise column metric that uses the following mapping (vartype-vartype: method): categorical-categorical: Cramér's V; numerical-categorical: Cramér's V (using a discretized numerical column); numerical-numerical: Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
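The calculation described above can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. This is a minimal illustration, not the code the report was generated with; the helper names `average_ranks`, `pearson` and `spearman_rho` are hypothetical, and the choice to give tied values their average rank is an assumption.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho is Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but nonlinear in x
rho = spearman_rho(x, y)
r = pearson(x, y)
print(f"rho={rho:.4f}, r={r:.4f}")
```

On this toy data the relationship is perfectly monotonic, so ρ is 1 up to floating-point error, while r stays below 1 because the relationship is not linear.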
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
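As a small illustration of the formula and of the invariance property, the sketch below computes r on toy data and then again after shifting and rescaling both variables. The function name `pearson_r` and the data are made up for the example, and the rescaling uses positive scale factors (a negative factor would flip the sign of r).

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Sample covariance of x and y divided by the product of sample standard deviations."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (stdev(x) * stdev(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_original = pearson_r(x, y)

# Change location and scale of each variable separately.
x_rescaled = [10.0 * a + 3.0 for a in x]
y_rescaled = [0.5 * b - 7.0 for b in y]
r_rescaled = pearson_r(x_rescaled, y_rescaled)

# Two exact lines with very different slopes give the same r.
r_steep = pearson_r(x, [5.0 * a for a in x])
r_shallow = pearson_r(x, [0.1 * a for a in x])

print(r_original, r_rescaled, r_steep, r_shallow)
```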
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
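The pair-counting definition above translates directly into code. The sketch below implements that simple form (often called tau-a), which makes no adjustment for ties; statistical libraries commonly report a tie-adjusted variant instead, so results can differ on data with repeated values. The function name `kendall_tau_a` and the data are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie adjustment."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]
tau = kendall_tau_a(x, y)   # 7 concordant and 3 discordant pairs out of 10
print(tau)
```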
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
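Returning to the bias-corrected Cramér's V mentioned in the correlation notes: the sketch below shows one way to compute it for two categorical columns, following the correction published by Bergsma (2013). It is an illustration under that assumption and not necessarily the exact code path used to produce this report; the function name `cramers_v_corrected` and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row_label, row_count in row_totals.items():
        for col_label, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row_label, col_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(k_corr - 1, r_corr - 1)
    if denominator <= 0:
        return 0.0   # a constant column carries no association information
    return sqrt(phi2_corr / denominator)


identical = [0, 1] * 50
v_perfect = cramers_v_corrected(identical, identical)

left = [0, 0, 1, 1] * 25
right = [0, 1, 0, 1] * 25
v_independent = cramers_v_corrected(left, right)

print(v_perfect, v_independent)
```

Two identical binary columns give a value of 1 up to floating-point error, and two columns whose joint counts match the product of their marginals give 0.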
First rows
| | response | feat.a | feat.b | feat.d | feat.e | feat.f | feat.h | feat.i | feat.c_a | feat.c_b | feat.c_c | feat.c_d | feat.g_x | feat.g_y | feat.g_z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | -0.681427 | -5.493698 | 0.0 | -0.800615 | -4.427602 | 10.254199 | -0.828073 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 1 | 0.309468 | -5.559933 | 1.0 | -1.155514 | -0.799094 | 9.084749 | -1.109698 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 2 | 1 | 5.676125 | -4.026970 | 1.0 | -3.396331 | -0.631966 | 8.753848 | -3.417417 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 3 | 1 | 1.211525 | -4.198263 | 1.0 | -1.894569 | -16.273262 | 12.191295 | -1.904801 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 1 | 1.387863 | -7.824014 | 1.0 | 4.696980 | -22.208877 | 9.626686 | 4.715903 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 5 | 1 | 6.145195 | -2.439140 | 0.0 | -0.574830 | 11.642609 | 12.362962 | -0.521423 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 6 | 1 | 2.382749 | -3.625411 | 0.0 | 1.326984 | -4.148881 | 9.226122 | 1.287618 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 7 | 1 | -2.795184 | -0.375689 | 1.0 | -0.869053 | -2.994862 | 7.973038 | -0.839326 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 8 | 0 | -1.060559 | -2.972203 | 0.0 | 0.719649 | -15.543748 | 12.893124 | 0.718503 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 9 | 1 | -0.336986 | -4.670439 | 1.0 | -0.605454 | 3.060399 | 9.803020 | -0.548610 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
Last rows
| | response | feat.a | feat.b | feat.d | feat.e | feat.f | feat.h | feat.i | feat.c_a | feat.c_b | feat.c_c | feat.c_d | feat.g_x | feat.g_y | feat.g_z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 990 | 1 | 3.027287 | -5.645709 | 1.0 | -4.847993 | -8.070246 | 12.043355 | -4.894286 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 991 | 1 | -2.222620 | -2.611733 | 0.0 | 0.735233 | -2.876741 | 10.090726 | 0.816545 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 992 | 0 | 2.363733 | -3.629801 | 0.0 | -5.109591 | -7.578162 | 9.301541 | -5.119936 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 993 | 0 | 0.360079 | -5.105157 | 0.0 | -1.393937 | -21.575596 | 10.537327 | -1.397865 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 994 | 0 | 1.939686 | -5.920013 | 0.0 | 0.098981 | -17.421105 | 11.705579 | 0.084855 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 995 | 0 | 0.730074 | -3.885035 | 0.0 | -3.356949 | -12.803344 | 11.204110 | -3.396673 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 996 | 1 | 4.211548 | -3.617253 | 0.0 | 2.034995 | 6.995753 | 9.208089 | 2.069752 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 997 | 1 | -3.053301 | -3.583830 | 1.0 | 1.929012 | -7.013105 | 7.637862 | 1.856356 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 998 | 1 | -0.567850 | -3.194716 | 1.0 | -1.849712 | 4.204816 | 11.725868 | -1.862466 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 999 | 1 | 0.252428 | -4.690728 | 1.0 | 1.742044 | -4.564031 | 7.909709 | 1.747037 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |